AMERICANO: Argument Generation with Discourse-driven Decomposition and Agent Interaction
Argument generation is a challenging task in natural language processing,
which requires rigorous reasoning and proper content organization. Inspired by
recent chain-of-thought prompting that breaks down a complex task into
intermediate steps, we propose Americano, a novel framework with agent
interaction for argument generation. Our approach decomposes the generation
process into sequential actions grounded on argumentation theory, which first
executes actions sequentially to generate argumentative discourse components,
and then produces a final argument conditioned on the components. To further
mimic the human writing process and improve the left-to-right generation
paradigm of current autoregressive language models, we introduce an argument
refinement module which automatically evaluates and refines argument drafts
based on feedback received. We evaluate our framework on the task of
counterargument generation using a subset of the Reddit/CMV dataset. The results
show that our method outperforms both end-to-end and chain-of-thought prompting
methods and generates more coherent and persuasive arguments with diverse
and rich content.
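The decompose-then-refine recipe is straightforward to prototype. Below is a minimal Python sketch of such a pipeline, assuming a generic `llm(prompt) -> str` callable; the prompts, the component set (claim, reasoning, evidence), and the stopping rule are illustrative assumptions, not the authors' exact implementation.

```python
from typing import Callable

def generate_argument(topic: str, llm: Callable[[str], str],
                      max_rounds: int = 3) -> str:
    # Step 1: execute sequential actions to produce discourse components.
    claim = llm(f"State a clear claim on the topic: {topic}")
    reasoning = llm(f"Give reasoning that supports this claim: {claim}")
    evidence = llm(f"Provide evidence backing this reasoning: {reasoning}")

    # Step 2: produce a full argument conditioned on the components.
    draft = llm(
        "Write a coherent, persuasive argument from these components.\n"
        f"Claim: {claim}\nReasoning: {reasoning}\nEvidence: {evidence}"
    )

    # Step 3: refinement loop -- evaluate the draft and revise it on feedback.
    for _ in range(max_rounds):
        feedback = llm(f"Critique this argument for logic and persuasiveness:\n{draft}")
        if "no issues" in feedback.lower():  # illustrative stopping rule
            break
        draft = llm(
            "Revise the argument to address the feedback.\n"
            f"Argument:\n{draft}\nFeedback:\n{feedback}"
        )
    return draft
```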
Scientific Opinion Summarization: Meta-review Generation with Checklist-guided Iterative Introspection
Opinions in the scientific domain can be divergent, leading to controversy or
consensus among reviewers. However, current opinion summarization datasets
mostly focus on the product review domain and do not account for this
variability, since they assume that input opinions are non-controversial.
To address this gap, we propose the task of scientific opinion summarization,
where research paper reviews are synthesized into meta-reviews. To facilitate
this task, we introduce ORSUM, a new dataset covering 10,989 paper meta-reviews
and 40,903 paper reviews from 39 conferences. Furthermore, we propose the
Checklist-guided Iterative Introspection (CGI) approach, which breaks down
the task into several stages and iteratively refines the summary under the
guidance of questions from a checklist. We conclude that (1) human-written
summaries are not always reliable since many do not follow the guidelines, and
(2) combining task decomposition with iterative self-refinement shows a
promising ability to engage with the reviewers' discussion and can be applied
to other complex text generation tasks that use black-box LLMs.
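A rough Python sketch of checklist-guided iterative refinement follows, again assuming a generic `llm(prompt) -> str` callable; the checklist questions and the yes/no introspection protocol are illustrative assumptions rather than the paper's actual checklist.

```python
from typing import Callable

# Illustrative checklist questions; the paper's checklist differs.
CHECKLIST = [
    "Does the summary state the main strengths raised by reviewers?",
    "Does the summary state the main weaknesses raised by reviewers?",
    "Does the summary acknowledge points where reviewers disagree?",
]

def meta_review(reviews: list[str], llm: Callable[[str], str]) -> str:
    joined = "\n---\n".join(reviews)
    # Stage 1: draft an initial meta-review from the raw reviews.
    summary = llm(f"Draft a meta-review for these paper reviews:\n{joined}")
    # Stage 2: introspect against each checklist question and revise on failure.
    for question in CHECKLIST:
        answer = llm(
            f"Question: {question}\nSummary:\n{summary}\n"
            "Answer yes or no, then explain."
        )
        if answer.strip().lower().startswith("no"):
            summary = llm(
                f"Revise the summary so the answer to '{question}' becomes yes.\n"
                f"Reviews:\n{joined}\nCurrent summary:\n{summary}"
            )
    return summary
```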
Multimedia Generative Script Learning for Task Planning
Goal-oriented generative script learning aims to generate subsequent steps to
reach a particular goal, which is an essential task to assist robots or humans
in performing stereotypical activities. An important aspect of this process is
the ability to capture historical states visually, which provides detailed
information that text alone does not cover and that guides subsequent steps.
Therefore, we propose a new task, Multimedia Generative Script Learning, to
generate subsequent steps by tracking historical states in both text and vision
modalities, as well as presenting the first benchmark containing 5,652 tasks
and 79,089 multimedia steps. This task is challenging in three aspects: the
multimedia challenge of capturing the visual states in images, the induction
challenge of performing unseen tasks, and the diversity challenge of covering
different information in individual steps. We propose to encode visual state
changes through a selective multimedia encoder to address the multimedia
challenge, transfer knowledge from previously observed tasks using a
retrieval-augmented decoder to overcome the induction challenge, and further
present distinct information at each step by optimizing a diversity-oriented
contrastive learning objective. We define metrics to evaluate both generation
and inductive quality. Experiment results demonstrate that our approach
significantly outperforms strong baselines.
Comment: 21 pages, Accepted by Findings of the Association for Computational
Linguistics: ACL 2023, Code and Resources at
https://github.com/EagleW/Multimedia-Generative-Script-Learnin
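As one concrete piece of the approach, a diversity-oriented contrastive objective can be sketched as a standard InfoNCE loss in PyTorch, with other steps in the batch acting as negatives. This is a generic formulation under assumed (batch, dim) embedding inputs, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(step_emb: torch.Tensor,
                     gold_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss: pull each step embedding toward its gold next-step
    embedding and push it away from the other in-batch steps (negatives)."""
    step_emb = F.normalize(step_emb, dim=-1)          # (batch, dim)
    gold_emb = F.normalize(gold_emb, dim=-1)          # (batch, dim)
    logits = step_emb @ gold_emb.T / temperature      # (batch, batch) similarities
    targets = torch.arange(step_emb.size(0), device=step_emb.device)
    return F.cross_entropy(logits, targets)           # positives on the diagonal
```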
Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization
Recent pre-trained language models (PLMs) achieve promising results on
existing abstractive summarization datasets. However, existing summarization
benchmarks overlap in time with the standard pre-training corpora and
fine-tuning datasets. Hence, the strong performance of PLMs may rely on the
parametric knowledge that is memorized during pre-training and fine-tuning.
Moreover, the knowledge memorized by PLMs may quickly become outdated, which
affects the generalization performance of PLMs on future data. In this work, we
propose TempoSum, a novel benchmark that contains data samples from 2010 to
2022, to understand the temporal generalization ability of abstractive
summarization models. Through extensive human evaluation, we show that
parametric knowledge stored in summarization models significantly affects the
faithfulness of the generated summaries on future data. Moreover, existing
faithfulness enhancement methods cannot reliably improve the faithfulness of
summarization models on future data. Finally, we offer several
recommendations to the research community on how to evaluate and improve the
temporal generalization capability of text summarization models.
Comment: Accepted at EMNLP 202
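To make the benchmark's premise concrete, here is a minimal Python sketch of the kind of temporal evaluation split TempoSum motivates: train on samples published before a cutoff date and test on "future" samples published after it. The field names and cutoff date are illustrative assumptions, not the benchmark's actual schema.

```python
from datetime import date

def temporal_split(examples: list[dict], cutoff: date = date(2020, 1, 1)):
    """Split (article, summary, date) records into pre-cutoff training data
    and post-cutoff 'future' test data. Field names are hypothetical."""
    train = [ex for ex in examples if ex["date"] < cutoff]
    future_test = [ex for ex in examples if ex["date"] >= cutoff]
    return train, future_test

# Example usage with toy records:
examples = [
    {"article": "...", "summary": "...", "date": date(2015, 6, 1)},
    {"article": "...", "summary": "...", "date": date(2022, 3, 1)},
]
train, future_test = temporal_split(examples)
```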